Coping with Lexical Gaps when Building Aligned Multilingual Wordnets
نویسندگان
چکیده
In this paper we present a methodology for automatically classifying the translation equivalents of a machine readable bilingual dictionary in three main groups: lexical units, lexical gaps (that is cases when a lexical concept of a language does not have a correspondent in the other language) and translation equivalents that need to be manually classified as lexical units or lexical gaps. This preventive classification reduces the manual work necessary to cope with lexical gaps in the construction of aligned multilingual wordnets.
منابع مشابه
Wordnet creation and extension made simple: A multilingual lexicon-based approach using wiki resources
In this paper, we propose a simple methodology for building or extending wordnets using easily extractible lexical knowledge from Wiktionary and Wikipedia. This method relies on a large multilingual translation/synonym graph in many languages as well as synset-aligned wordnets. It guesses frequent and polysemous literals that are difficult to find using other methods by looking at back-translat...
متن کاملThe Automatic Mapping of Princeton WordNet Lexical-Conceptual Relations onto the Brazilian Portuguese WordNet Database
Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980 ́s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets, and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Other important...
متن کاملOntology-Based Word Sense Disambiguation in Parallel Corpora
Lately, there seems to be a growing acceptance of the idea that multilingual lexical ontologies might be the key towards aligning different views on the semantic atomic units to be used in characterizing the general meaning of various and multilingual documents. Comparing performances of word sense disambiguation systems is a difficult evaluation task when different sense inventories are used a...
متن کاملRomanian WordNet: New Developments and Applications
Among the existing ontologies, the multilingual lexical ontologies have a special status. Structured in a similar way to standard ontologies, the lexical ones are distinguished by the fundamental requirement that each conceptualized entity is lexicalized by one or more synonymous words (a synset) of the natural language vocabulary. Multilingually aligned wordnets, such as EuroWordNet or BalkaNe...
متن کاملUsing Multilingual Resources for Building SloWNet Faster
This project report presents the results of an approach in which synsets for Slovene wordnet were induced automatically from parallel corpora and already existing wordnets. First, multilingual lexicons were obtained from word-aligned corpora and compared to the wordnets in various languages in order to disambiguate lexicon entries. Then appropriate synset ids were attached to Slovene entries fr...
متن کامل